ABSTRACT

Understanding the behaviors of information propagation is essen- 
tial for the effective exploitation of social influence in social net- 
works. However, few existing influence models are tractable and 
efficient for describing the information propagation process, espe- 
cially when dealing with the difficulty of incorporating the effects 
of combined influences from multiple nodes. To this end, in this pa- 
per, we provide a social influence model that alleviates this obstacle 
based on electrical circuit theory. This model vastly improves the 
efficiency of measuring the influence strength between any pair of 
nodes, and can be used to interpret the real-world influence prop- 
agation process in a coherent way. In addition, this circuit the- 
ory model provides a natural solution to the social influence max- 
imization problem. When applied to real-world data, the circuit 
theory model consistently outperforms the state-of-the-art methods 
and can greatly alleviate the computation burden of the influence 
maximization problem. 

1. INTRODUCTION 

A social network is a graph of relationships between individuals, 
groups, or organizations. As an effective tool in connecting people 
and spreading information, social networks have caught the atten- 
tion of millions of small businesses as well as major companies. To 
exploit social networks to gain influence, it is essential to under- 
stand the behaviors of information propagation in social networks. 
However, given the large scale of existing social networks, it is
challenging to design a tractable and efficient model to describe the
information propagation process. For instance, consider a scenario in
which an event (such as a rumor) initially happens on a node (denoted
as the seed node). This event may happen again on some neighboring
nodes and may further happen on some far-away nodes due to information
propagation through the network. The questions are then: how does the
information propagate, and, after sufficient propagation, with what
probability will this event happen on a random node? (This probability
can be viewed as the influence strength from the seed node to this
random node.)

Indeed, there are several existing models that describe the above
information propagation process, such as the Linear Threshold (LT)
model [10] and the Independent Cascade (IC) model [8]. However, these
models are operational models; they are intractable and inefficient.
Under these models, to get the probability that an event happens on a
node, we need to run Monte-Carlo simulations of the influence model
sufficiently many times (e.g., 20000 times) to obtain an accurate
estimate [5], and this approach is very time-consuming. Moreover, if
the event initially happened on more than one node (seed), the
probability that this event happens on a random node is a combination
of influences from all of these seeds. In this case, how to get the
probability that the event happens and how to identify the independent
probability (or independent influence) of each seed are challenging
research issues.

To this end, in this paper, we propose a social influence model
based on electrical circuit theory to simulate the information
propagation process in social networks. With this model, we are able
to obtain a probabilistic influence matrix that describes the influence
strength between any pair of nodes. Also, based on this model, we
provide a novel method to identify the independent influence of each
seed node when there is more than one seed. This leads to a natural
solution to the social influence maximization problem [11, 5, 21],
which targets finding a small subset of influential nodes in a social
network to influence as many nodes as possible. Specifically, the
contributions of this paper can be summarized as follows.

• We propose the idea of exploiting the circuit network to sim- 
ulate the information (or influence) propagation process in 
social networks. This simulation is straightforward and is 
easy to interpret and understand. To the best of our knowl- 
edge, this is the first work that exploits circuit theory for es- 
timating the spread of social influence. What's more, this 
model is tractable even for more than one seed node. With 
this model, we could provide a novel way to compute the 
independent influence of a seed naturally. 

• We identify an upper bound of the node's influence on the 
network. This upper bound property can help us to select 
the truly influential nodes and drastically reduce the search 
space. By exploiting these computational properties, we de- 
velop an efficient approximation method to compute the in- 
fluences and successfully apply this circuit theory based model 
to solve the social influence maximization problem. Along
this line, we first prove that the influence spread function
is also submodular under the circuit model. Then, using the
greedy strategy, we design an algorithm to select K influential
nodes to maximize their influence. Finally, experimental
results show that the circuit theory based methods have
advantages over the state-of-the-art algorithms for the social
influence maximization problem in terms of both efficiency
and effectiveness.



2. A SOCIAL INFLUENCE MODEL BASED 
ON CIRCUIT THEORY 

In this section, we propose a social influence model based on 
electrical circuit theory. 

2.1 Social Influence Modeling 

Social influence refers to the behavioral change of individuals
affected by others in a network. Social influence is an intuitive and
well-accepted phenomenon in social networks [7]. In this section, we
propose a quantitative definition of social influence to measure its
amount.

Let G = (V, E) be an information network (footnote 1), where the node
set V includes all individuals and the arc set E represents all the
social connections. Let $F_{i \to j}$ denote the influence from node i
to node j. We propose three rules to define the influence between any
pair of nodes.

1. The influence from oneself should be 1, that is, $F_{i \to i} = 1$.

2. The influence could transmit through the network arcs with
certain probability.

3. The influence on one node should be related to the influences
on the node's neighbors. Suppose node j's neighbor set is
$N_j = \{j_1, j_2, \ldots, j_m\}$ (i.e., $\forall k \in N_j$, $(k, j) \in E$), then

$$F_{i \to j} = f_j\left(t_{j_1 j} F_{i \to j_1},\ t_{j_2 j} F_{i \to j_2},\ \ldots,\ t_{j_m j} F_{i \to j_m}\right) \quad (1)$$

where $t_{kj}$ is the transmission probability on arc (k, j).

Two factors shape the model constructed by the above three rules.
The first one is the transmission probabilities on arcs. In this
paper, we propose an assumption to confine these probabilities:

Assumption 1. The sum of transmission probabilities flowing
into one node should be less than or equal to 1. That is,

$$\theta_i = \sum_{j=1}^{n} t_{ji} \le 1 \quad \text{for } i = 1, 2, \ldots, n$$

or

$$T'e = \Theta e \le e \quad (2)$$

where $\Theta = \mathrm{diag}(\theta_1, \theta_2, \ldots, \theta_n)$, e is the all-ones vector, and
$T = [t_{ij}]_{n \times n}$ is a transmission matrix in which $t_{ij}$ is the
transmission probability from node i to node j. If $(j, i) \notin E$, then
$t_{ji} = 0$.

This assumption is used for measuring the amount of information
(e.g., with regard to an event or message) that will be accepted by
each node. The corresponding value varies in the range [0, 1], where 0
stands for ignorance of the message and 1 means that the node totally
believes in it. Notably, in the last section of this paper, we also
discuss how to relax this assumption.

The second factor is how the influence on one node is related to
the influences on its neighbors. In previous work, Aggarwal et al. [1]
proposed a way to describe the relation among the neighbors'
influence, that is

$$F_{i \to j} = 1 - \prod_{k \in N_j} \left(1 - t_{kj} F_{i \to k}\right) \quad (3)$$

1 In an information network, information flows along the direction
of an arc. Any network can be turned into an information network. For
example, Twitter's information network should be its inverse, because
information flows from node j to node i on arc (i, j) in Twitter.



which claims that the influences transmitted from different neighbors
are independent of each other. This is theoretically reasonable, but
it is too complex to admit a closed-form solution. So, in this paper,
we propose a linear method to define it. That is,

$$F_{i \to j} = \frac{1}{1 + \lambda_j} \sum_{k \in N_j} t_{kj} F_{i \to k} \quad \text{for } j \ne i \quad (4)$$

where $\lambda_j \in (0, +\infty)$ is a damping coefficient of node j for the
influence propagation. The smaller $\lambda_j$ is (i.e., approaching 0), the
less the information will be blocked by node j. This number varies
with respect to the topic that the propagating information belongs to.
If node j favors the topic of the propagating information, $\lambda_j$
should approach 0; otherwise, it should approach $+\infty$. Thus, this
model can be topic sensitive when the damping coefficient of each node
is decided according to the node's preferences.

2.2 A Physical Implication For Linear Model 

The linear model proposed in Equation (4) has a physical implication
when the information network G is undirected. We find that when we
construct a circuit network G' in the following way, the current flows
in the circuit in the same way as the information propagation
described by Equation (4), and the potential value on each node is
equivalent to the influence on it. For an undirected network G in
Figure 1(a), we construct its corresponding circuit network G' in
Figure 1(b) as follows.




(a) A sample social network. (b) The circuit network. 



Figure 1: An Illustration Example. 

• First, construct a topologically isomorphic circuit network of
G, where the conductance between node i and j is equal to
$c_{ij}$ and guarantees that $t_{ij}/\theta_j = c_{ij}/d_j$, where $d_j = \sum_{i=1}^{n} c_{ij}$.

• Second, connect each node i with an external electrode $E_i$
through an additional electric conductor with conductance
$(1 + \lambda_i - \theta_i) d_i / \theta_i$. The electric potential value on $E_i$ is
$v_i / (1 + \lambda_i - \theta_i)$, where $v_i$ is a real number which will be
decided later.

In G', Kirchhoff equations [13] tell us that the total current flowing
into any node j should sum up to zero, thus in the circuit network
G':

$$I_j = \sum_{k=1}^{n} c_{kj}(U_k - U_j) + \frac{(1 + \lambda_j - \theta_j) d_j}{\theta_j}\left(\frac{v_j}{1 + \lambda_j - \theta_j} - U_j\right) = 0 \quad (5)$$

where $I_j$ is the total current flowing into node j and $U_j$ is the
electric potential value on node j. From Equation (5) we have

$$U_j = \frac{1}{d_j + \frac{(1 + \lambda_j - \theta_j) d_j}{\theta_j}}\left(\sum_{k=1}^{n} c_{kj} U_k + \frac{v_j d_j}{\theta_j}\right) = \frac{1}{1 + \lambda_j}\left(\sum_{k=1}^{n} t_{kj} U_k + v_j\right) \quad (6)$$

In this equation, if we decide $v_j$ in the following way

$$v_j = \begin{cases} \text{a number to guarantee } U_i = 1 & j = i \\ 0 & j \ne i \end{cases} \quad (7)$$

then comparing with Equation (4) it is easy to get

$$U_j = F_{i \to j}$$

which means that the potential value on node j is equivalent to the
influence $F_{i \to j}$.

Notably, when $j \ne i$, $v_j = 0$, so the potential value on $E_j$ is
also equal to 0; all of these electrodes can be viewed as ground
nodes, and thus this circuit network works well.

2.3 Influence Matrix 

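In light of the circuit network, we amend our linear model as
follows

$$F_{i \to j} = \frac{1}{1 + \lambda_j}\left(\sum_{k=1}^{n} t_{kj} F_{i \to k} + v_j\right) \quad \text{for } j = 1, 2, 3, \ldots, n \quad (8)$$

where $v_j$ is a correction to guarantee that, when j = i, $F_{i \to j} = 1$.
Thus, the value of $v_j$ could be determined as

$$v_j = \begin{cases} \text{a number to guarantee } F_{i \to i} = 1 & j = i \\ 0 & j \ne i \end{cases} \quad (9)$$

Equation (8) could be rewritten as

$$F_i = (I + \Lambda)^{-1}(T' F_i + v)$$

where

$$F_i = [F_{i \to 1}, F_{i \to 2}, \ldots, F_{i \to n}]'$$
$$v = [0, 0, \ldots, v_i \ne 0, \ldots, 0]'$$
$$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$$

which could be solved as

$$F_i = (I + \Lambda - T')^{-1} v \quad (10)$$
$$\phantom{F_i} = \Gamma^{-1} v = P v \quad (11)$$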

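where $(I + \Lambda - T')$ is strictly diagonally dominant (row j has
diagonal entry $1 + \lambda_j$ and off-diagonal magnitudes summing to
$\theta_j \le 1$) and thus invertible. We denote $(I + \Lambda - T')$ as
$\Gamma = [\gamma_{ij}]_{n \times n}$ and denote its inverse as $P = [p_{ij}]_{n \times n}$. Since v
is a vector with only one nonzero entry $v_i$,

$$F_i = v_i P_{\cdot i} \quad (12)$$

and, based on Rule 1, $F_{i \to i}$ should be equal to 1, so

$$v_i p_{ii} = 1$$

thus,

$$v_i = \frac{1}{p_{ii}} \quad (13)$$

Similarly, we could get

$$F_j = v_j P_{\cdot j} = \frac{1}{p_{jj}} P_{\cdot j} \quad \text{for } j = 1, 2, 3, \ldots, n \quad (14)$$

This equation could be rewritten as

$$F = [F_1, F_2, \ldots, F_n]' = \left[\frac{1}{p_{11}} P_{\cdot 1}, \frac{1}{p_{22}} P_{\cdot 2}, \ldots, \frac{1}{p_{nn}} P_{\cdot n}\right]' = \mathrm{diag}(P)^{-1} P' = [f_{ij}]_{n \times n}$$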

Because the entries of matrix F describe the influences between
any pair of nodes (in F, $f_{ij} = F_{i \to j}$), in this paper we call F the
influence matrix and call $F_i$ node i's influence vector.

2.3.1 The Computation of F. 

Because $F = \mathrm{diag}(P)^{-1} P'$, computing F amounts to getting
$P = (I + \Lambda - T')^{-1}$. Because $(I + \Lambda - T')$ is a strictly
diagonally dominant matrix, which satisfies the convergence condition
of the Gauss-Seidel method, its inverse P can be computed quickly
through a Gauss-Seidel iteration process.

Consider

$$(I + \Lambda - T') P_{\cdot i} = e_i$$

where the entries of $P_{\cdot i}$ are viewed as the variables of this linear
system of equations and $e_i$ is the i-th unit vector. Since
$(I + \Lambda - T')$ is strictly diagonally dominant, $P_{\cdot i}$ can be solved by
the Gauss-Seidel method. Specifically, the Gauss-Seidel method is an
iterative method which operates as follows:

1. Set $p_{ji}^{(0)} = 0$ for j = 1, 2, ..., n;

2. $p_{ji}^{(k+1)} = \frac{1}{\gamma_{jj}}\left(e_{ji} - \sum_{l < j} \gamma_{jl}\, p_{li}^{(k+1)} - \sum_{l > j} \gamma_{jl}\, p_{li}^{(k)}\right)$ for j = 1, 2, ..., n, where $e_{ji}$ is the j-th entry of $e_i$;

3. Repeat Step 2 until the changes made by an iteration are
below a given tolerance.

This procedure is efficient. To get $P_{\cdot i}$ within a valid tolerance
range, it often needs only dozens of iterations. Thus, the time
complexity of computing $P_{\cdot i}$ is O(|E|) and the time complexity of
computing F is O(|V||E|). Moreover, because the computation of P can be
partitioned into n parts, we can compute the n parts in parallel.

Even so, the computation of F is still too expensive to be applied to
a large-scale network. For real networks, it is not necessary to
compute the influence vector of every node, since many nodes are
unimportant in the network. In the following section, we propose a
fast method to estimate which nodes are important to the network.
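The Gauss-Seidel procedure of Section 2.3.1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the dense list-of-lists storage, the function names `gauss_seidel` and `influence_vector`, and the 3-node path network are hypothetical choices made here, not part of the paper (a practical implementation would use sparse storage to reach the O(|E|) cost per sweep).

```python
def gauss_seidel(gamma, b, tol=1e-12, max_iter=10_000):
    """Solve gamma @ x = b by Gauss-Seidel sweeps, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        change = 0.0
        for j in range(n):
            # Entries l < j were already updated in this sweep; l > j are old.
            rest = sum(gamma[j][l] * x[l] for l in range(n) if l != j)
            new = (b[j] - rest) / gamma[j][j]
            change = max(change, abs(new - x[j]))
            x[j] = new
        if change < tol:
            break
    return x


def influence_vector(T, lam, i):
    """Return F_i = P_{.i} / p_ii, where (I + Lambda - T') P_{.i} = e_i."""
    n = len(T)
    # gamma[j][k] = (I + Lambda - T')_{jk}; T[k][j] is t_kj (no self-loops assumed).
    gamma = [[(1.0 + lam[j] if j == k else 0.0) - T[k][j] for k in range(n)]
             for j in range(n)]
    e_i = [1.0 if j == i else 0.0 for j in range(n)]
    col = gauss_seidel(gamma, e_i)
    return [p / col[i] for p in col]


# Hypothetical 3-node path 0 - 1 - 2 with t_kj = 1 / degree(j).
T = [[0.0, 0.5, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 0.5, 0.0]]
lam = [0.5, 0.5, 0.5]
F0 = influence_vector(T, lam, 0)  # approximately [1, 3/7, 2/7]
```

On this toy network the influence of node 0 decays with distance, and the first entry is exactly the self-influence required by Rule 1.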

2.3.2 The Upper Bound of Node's Influence On a 
Network 

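Suppose the influence vector of node i is $F_i = [f_{i1}, f_{i2}, \ldots, f_{in}]'$;
then its total influence on the network is $\sum_{j=1}^{n} f_{ij}$. We denote
this number as $f_i$; it can be viewed as the importance of node i in
the network.

On the other hand, we denote $P_i = P_{\cdot i}' e = \sum_{j=1}^{n} p_{ji}$. Then, we
get the following property.

Property 1.

$$f_i \le (1 + \lambda_i) P_i \quad (16)$$

Proof. Since

$$f_i = \sum_{j=1}^{n} f_{ij} = \sum_{j=1}^{n} \frac{p_{ji}}{p_{ii}} = \frac{P_i}{p_{ii}} \quad (17)$$

it suffices to bound $1/p_{ii}$. For $P(I + \Lambda - T') = I$, the dot product
of the i-th row of P and the i-th column of $(I + \Lambda - T')$ should be
1, that is

$$\sum_{l=1}^{n} p_{il}\,(I + \Lambda - T')_{li} = 1 \quad (15)$$

where $(I + \Lambda - T')_{ii} = (1 + \lambda_i)$ and $(I + \Lambda - T')_{li} = -t_{il}$ for any
$l \ne i$. Then, this equation is equivalent to
$p_{ii}(1 + \lambda_i) - \sum_{l=1, l \ne i}^{n} p_{il} t_{il} = 1$. Thus, we have
$p_{ii}(1 + \lambda_i) = 1 + \sum_{l=1, l \ne i}^{n} p_{il} t_{il} \ge 1$. And then,

$$\frac{1}{p_{ii}} \le (1 + \lambda_i)$$

With Equation (17), there is $f_i \le (1 + \lambda_i) P_i$. □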



In Equation (16), the quantity $P_i$ can be obtained by a fast method.
Let us denote $\mathcal{P} = [P_1, P_2, \ldots, P_n]'$; since $P_i = P_{\cdot i}' e$, there is
$\mathcal{P} = P'e$, which can be rewritten as

$$(I + \Lambda - T)\mathcal{P} = e$$

This is a linear system of equations which satisfies the convergence
condition of the Gauss-Seidel method. As the variables of this system,
$\mathcal{P}$ can be computed in O(|E|) time in a way similar to that described
in Section 2.3.1. Thus, thanks to Property 1, we can get the upper
bound of every node's importance in O(|E|) time. These upper bounds
help us quickly identify the less important nodes and skip them.
What's more, this is a very tight upper bound on the importance of
node i, as we demonstrate in Section 5.
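As a hedged illustration of Property 1, the sketch below solves the single system with matrix (I + Λ − T) and the all-ones right-hand side by Gauss-Seidel sweeps and returns the bound (1 + λ_i)P_i for every node. The dense storage, the function names and the 3-node path network are assumptions made for this example only.

```python
def gauss_seidel(A, b, tol=1e-12, max_iter=10_000):
    """Solve A @ x = b by Gauss-Seidel sweeps, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        change = 0.0
        for j in range(n):
            rest = sum(A[j][l] * x[l] for l in range(n) if l != j)
            new = (b[j] - rest) / A[j][j]
            change = max(change, abs(new - x[j]))
            x[j] = new
        if change < tol:
            break
    return x


def influence_upper_bounds(T, lam):
    """Return [(1 + lam_i) * P_i], where (I + Lambda - T) P = e (Property 1)."""
    n = len(T)
    # A[i][j] = (I + Lambda - T)_{ij}; note T itself, not its transpose.
    A = [[(1.0 + lam[i] if i == j else 0.0) - T[i][j] for j in range(n)]
         for i in range(n)]
    P = gauss_seidel(A, [1.0] * n)
    return [(1.0 + lam[i]) * P[i] for i in range(n)]


# Hypothetical 3-node path 0 - 1 - 2 with t_kj = 1 / degree(j).
T = [[0.0, 0.5, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 0.5, 0.0]]
lam = [0.5, 0.5, 0.5]
bounds = influence_upper_bounds(T, lam)  # approximately [2.4, 4.2, 2.4]
```

For this network the exact total influences are 12/7, 7/3 and 12/7, so each bound holds and the ranking of the nodes is preserved, which is what the pruning step relies on.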

3. THE INFLUENCE FROM A NODE SET 

In real applications, we may need to compute the influence from
a node set S to the nodes of the network. (Consistent with the
notation $F_{i \to j}$, we denote the influence from set S to node j as
$F_{S \to j}$.) In [9], Goyal et al. assumed that the influences of
different nodes are independent of each other. Hence, the joint
influence $F_{S \to j}$ can be defined as

$$F_{S \to j} = 1 - \prod_{i \in S} \left(1 - F_{i \to j}\right)$$

However, the influences from different nodes are obviously not
mutually independent. Then, how can we get the independent influence
of a node when it belongs to a node set S?
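For concreteness, the independence-based combination rule quoted above can be written as a one-line function. The function name and the numbers are hypothetical and serve only to illustrate the formula.

```python
from math import prod


def joint_influence(per_seed_influences):
    """Combine per-seed influences on one node, assuming they are independent."""
    return 1.0 - prod(1.0 - f for f in per_seed_influences)


# Two seeds influencing the same node with strengths 0.5 and 0.2.
combined = joint_influence([0.5, 0.2])  # 1 - 0.5 * 0.8
```

The combined value is never smaller than the largest individual influence and never exceeds 1.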

3.1 Independent Influence Inference 

To simplify the discussion, we first solve an equivalent problem
defined as follows.

Problem 1. Given a seed set S and a node $k \notin S$, how to evaluate
the independent influence $F_{k \to j}^{(S)}$ (independent from the nodes of
S)?

For this problem, if we could work out $F_{k \to j}^{(S)}$, then we could
work out $F_{s \to j}^{(S + \{k\} - \{s\})}$ for any $s \in S$ in the same way. Then,
for the event source set $S' = S + \{k\}$, we could work out the
independent influence for any element in S', and

$$F_{S' \to j} = 1 - \prod_{s \in S'} \left(1 - F_{s \to j}^{(S' - \{s\})}\right) \quad (18)$$

Because $F_{i \to j} = f_{ij}$, for ease of writing and consistency with the
previous notation, in the following text we use $f_{ij}$ to denote
$F_{i \to j}$, $f_{S'j}$ to denote $F_{S' \to j}$, and $f_{kj}^{(S)}$ to denote $F_{k \to j}^{(S)}$.
By Equation (18), the happening probability $f_{S'j}$ can be computed.
Thus, Problem 1 is the basis for computing the influence from a node
set.

In order to get the independent influence $f_{kj}^{(S)}$, let us think
about the essence of independent influence from the circuit
perspective. We summarize independent influence into two conditions:

1. Each node in S will never be influenced by node k;

2. Each node in S will never propagate the influence derived
from node k.

Because the nodes in S are the ones on which event e has already
happened, they do not need to be influenced by k and cannot be
influenced. Therefore, condition 1 above needs to be satisfied. On the
other hand, because each node in S itself spreads influence about
event e on the network, it will block the same type of influence from
k. Therefore, we assume that nodes in S will never propagate the
influence from k.

Based on the above two conditions, we can construct a circuit
network G'(S, k) to model the independent influence from node k to the
other nodes as follows:

1. First, construct the circuit network G';

2. Put an electric pole with voltage value 0 on each node in S;

3. Set the voltage value on external electrode $E_k$ to $v_k^{(S)} / (1 + \lambda_k - \theta_k)$;

4. Set the voltage value on external electrode $E_j$ ($j \ne k$) to 0.

For the graph in Figure 1(a), suppose the seed set S = {3, 8} and
k = 4; then the corresponding G'(S, 4) is shown in Figure 2. From this
figure, we can see that the above two conditions about independent
influence are satisfied naturally.




Figure 2: The circuit network corresponding to Figure 1 (a). 

3.2 The Deduction of Independent Influence 

In the previous subsection, we constructed the circuit network for
simulating the independent influence. In this subsection, we first
analyze the potential value on each node, which can be viewed as the
influence on it. Then, we provide several properties of independent
influence and an efficient way of computing it.

The Electric Potential on Each Node for G'(S, k). Based on a
discussion similar to that in Subsection 2.3, we know that the
electric potential on node j (except for nodes in S) is

$$U_j = \frac{1}{1 + \lambda_j}\left(\sum_{i=1}^{n} t_{ij} U_i + v_j^{(S)}\right) \quad \text{for } j \notin S \quad (19)$$

where

$$v_j^{(S)} = \begin{cases} \text{a number to make } U_j = 1 & j = k \\ 0 & \text{otherwise} \end{cases} \quad (20)$$

Because $U_i = 0$ for $i \in S$, Equation (19) can be rewritten as

$$(1 + \lambda_j) U_j - \sum_{i \notin S} t_{ij} U_i = v_j^{(S)} \quad \text{for } j \notin S \quad (21)$$

its 

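Without loss of generality, we can set $S = \{n - |S| + 1, n - |S| + 2, \ldots, n\}$
(i.e., the last |S| nodes in G'); then Equation (21) can be rewritten
as $(1 + \lambda_j) U_j - \sum_{i=1}^{n - |S|} t_{ij} U_i = v_j^{(S)}$, for
j = 1, 2, ..., n − |S|, which is equivalent to

$$\Gamma_{\bar{S}\bar{S}}\, U_{\bar{S}} = v_{\bar{S}} \quad (22)$$

where $U_{\bar{S}} = [U_1, U_2, \ldots, U_{n-|S|}]'$, $v_{\bar{S}} = [v_1^{(S)}, v_2^{(S)}, \ldots, v_{n-|S|}^{(S)}]'$ and
$\Gamma_{\bar{S}\bar{S}}$ is the matrix cut down from $\Gamma$ by removing the columns and
rows from n − |S| + 1 to n:

$$\Gamma_{\bar{S}\bar{S}} = \begin{bmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1(n-|S|)} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{(n-|S|)1} & \gamma_{(n-|S|)2} & \cdots & \gamma_{(n-|S|)(n-|S|)} \end{bmatrix}$$

and then, by Equation (22), we have

$$U_{\bar{S}} = \Gamma_{\bar{S}\bar{S}}^{-1}\, v_{\bar{S}} \quad (23)$$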



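Since only the k-th element of $v_{\bar{S}}$ is nonzero, $U_{\bar{S}}$ is equal to
the product of $v_k^{(S)}$ and the k-th column of $\Gamma_{\bar{S}\bar{S}}^{-1}$.

Based on Equation (20), we know that the value of $v_k^{(S)}$ should
guarantee that $U_k = 1$. By Equation (23), that is
$(U_{\bar{S}})_k = v_k^{(S)} (\Gamma_{\bar{S}\bar{S}}^{-1})_{kk} = 1$. Then,

$$v_k^{(S)} = \frac{1}{(\Gamma_{\bar{S}\bar{S}}^{-1})_{kk}} \quad (24)$$

And we could get

$$U_j = \begin{cases} \dfrac{(\Gamma_{\bar{S}\bar{S}}^{-1})_{jk}}{(\Gamma_{\bar{S}\bar{S}}^{-1})_{kk}} & j = 1, 2, \ldots, n - |S| \\ 0 & \text{otherwise} \end{cases} \quad (25)$$

Based on the previous discussion, the potential on node j can be
viewed as the independent influence from node k ($k \notin S$) to node j.
Thus, there is

Theorem 1.

$$f_{kj}^{(S)} = \begin{cases} \dfrac{(\Gamma_{\bar{S}\bar{S}}^{-1})_{jk}}{(\Gamma_{\bar{S}\bar{S}}^{-1})_{kk}} & j = 1, 2, \ldots, n - |S| \\ 0 & \text{otherwise} \end{cases} \quad (26)$$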



Theorem 1 for j = 1, 2, ..., n − |S| can be rewritten as

The Properties of Independent Influence. We have two properties
about the independent influence as follows.

Property 2. $f_{kj}^{(S)} \le f_{kj}$.

Property 3. If we define $f_k^{(S)} = \sum_{j=1}^{n} f_{kj}^{(S)}$, then
$f_k^{(S)} \le f_k \le (1 + \lambda_k) P_k$.

Property 2 shows that $f_{kj}^{(S)}$ cannot be greater than $f_{kj}$. If S is
not empty, the influence from node k will be diluted by the influence
from nodes in S. Property 3 provides an upper bound of $f_k^{(S)}$, which
can help us avoid unnecessary computation for those nodes with small
independent influence in the network.

The Computation of Independent Influence. Based on Theorem 1, given
the seed set S, if we want to compute the independent influence of
node k, we just need to compute the k-th column of $\Gamma_{\bar{S}\bar{S}}^{-1}$.
Because $\Gamma_{\bar{S}\bar{S}}$ is a strictly diagonally dominant matrix, it
satisfies the convergence condition of the Gauss-Seidel method. And
from $\Gamma_{\bar{S}\bar{S}} \Gamma_{\bar{S}\bar{S}}^{-1} = I$, there is
$\Gamma_{\bar{S}\bar{S}} (\Gamma_{\bar{S}\bar{S}}^{-1})_{\cdot k} = e_k$. Following procedures similar to those
in Section 2.3.1, we can compute the k-th column of $\Gamma_{\bar{S}\bar{S}}^{-1}$ in
O(|E|) time.
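The computation described above can be sketched as follows: drop the rows and columns of the seeds from (I + Λ − T'), solve for the k-th column of the inverse of the reduced matrix, and normalize by its k-th entry as in Equation (26). This is a minimal sketch, assuming dense storage and a hypothetical 3-node path network; the function names are illustrative only.

```python
def gauss_seidel(A, b, tol=1e-12, max_iter=10_000):
    """Solve A @ x = b by Gauss-Seidel sweeps, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        change = 0.0
        for j in range(n):
            rest = sum(A[j][l] * x[l] for l in range(n) if l != j)
            new = (b[j] - rest) / A[j][j]
            change = max(change, abs(new - x[j]))
            x[j] = new
        if change < tol:
            break
    return x


def independent_influence(T, lam, S, k):
    """Return [f_kj^(S) for all j]; entries for nodes in S are 0 (Theorem 1)."""
    n = len(T)
    keep = [j for j in range(n) if j not in S]  # nodes outside the seed set
    # Reduced matrix: rows and columns of I + Lambda - T' for nodes outside S.
    A = [[(1.0 + lam[j] if j == l else 0.0) - T[l][j] for l in keep]
         for j in keep]
    e_k = [1.0 if j == k else 0.0 for j in keep]
    col = gauss_seidel(A, e_k)          # k-th column of the reduced inverse
    pivot = col[keep.index(k)]          # its k-th entry
    f = [0.0] * n
    for pos, j in enumerate(keep):
        f[j] = col[pos] / pivot
    return f


# Hypothetical 3-node path 0 - 1 - 2 with t_kj = 1 / degree(j).
T = [[0.0, 0.5, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 0.5, 0.0]]
lam = [0.5, 0.5, 0.5]
f_free = independent_influence(T, lam, set(), 0)   # approximately [1, 3/7, 2/7]
f_blocked = independent_influence(T, lam, {1}, 0)  # node 1 is a seed
```

With node 1 in the seed set, node 0 can no longer reach node 2, so its independent influence on node 2 drops to zero, in line with Property 2.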



4. SOCIAL INFLUENCE MAXIMIZATION 

In this section, we show how to exploit the circuit model for
handling the social influence maximization problem, which targets
finding a set of seeds that will influence the maximal number of
individuals in the network.

4.1 Problem Formulation and Algorithm

The social influence maximization problem can be formalized as

$$S = \arg\max_{S \subseteq V} \sum_{j=1}^{n} F_{S \to j} \quad \text{subject to } |S| = K$$

where $F_{S \to j}$ follows the meaning in Equation (18), i.e., it is the
influence from S to node j. In the following text, we denote
$\sum_{j=1}^{n} F_{S \to j}$ as $\sigma(S)$, which is the influence spread function of
S (i.e., the number of individuals that will be influenced by S).

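Properties of $\sigma(S)$. Under the circuit model, we can prove that
the influence spread function $\sigma(S)$ is a submodular function, as
shown in the following theorem.

Theorem 2. For all seed sets $S_1 \subseteq S_2 \subseteq V$ and any node s, it
holds that

$$\sigma(S_1 \cup \{s\}) - \sigma(S_1) \ge \sigma(S_2 \cup \{s\}) - \sigma(S_2)$$

Proof. Since $\sigma(S) = \sum_{j=1}^{n} F_{S \to j} = \sum_{j=1}^{n} f_{Sj}$ and

$$\sigma(S \cup \{s\}) = \sum_{j=1}^{n} \left(1 - (1 - f_{Sj})(1 - f_{sj}^{(S)})\right)$$

we have

$$\sigma(S \cup \{s\}) - \sigma(S) = \sum_{j=1}^{n} f_{sj}^{(S)} (1 - f_{Sj})$$

For $S_1 \subseteq S_2$, it is easy to get that

$$f_{sj}^{(S_1)} \ge f_{sj}^{(S_2)}, \quad f_{S_1 j} \le f_{S_2 j}$$

and then,

$$\sum_{j=1}^{n} f_{sj}^{(S_1)} (1 - f_{S_1 j}) \ge \sum_{j=1}^{n} f_{sj}^{(S_2)} (1 - f_{S_2 j})$$

which is,

$$\sigma(S_1 \cup \{s\}) - \sigma(S_1) \ge \sigma(S_2 \cup \{s\}) - \sigma(S_2) \quad (27)$$

□

For simplicity, we denote $\sigma(S \cup \{s\}) - \sigma(S)$ as $\delta_s^{(S)}$ in the
following text. From the proof of Theorem 2,
$\delta_s^{(S)} = \sum_{j=1}^{n} f_{sj}^{(S)} (1 - f_{Sj})$; specifically, $\delta_s^{(\emptyset)} = f_s$.
Thus, Equation (27) can be rewritten as $\delta_s^{(S_1)} \ge \delta_s^{(S_2)}$.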

Proposed Algorithm. Following a greedy strategy, we always choose
the node which produces the largest increment in influence spread when
added into S. The greedy algorithm starts with an empty set
$S_0 = \emptyset$ and, iteratively, in step k, adds into $S_{k-1}$ the node $s_k$
which maximizes the increment in influence spread:

$$s_k = \arg\max_{s \in V \setminus S_{k-1}} \delta_s^{(S_{k-1})}$$

In this process, there is

Property 4. $(1 + \lambda_s) P_s \ge \delta_s^{(S_0)} \ge \delta_s^{(S_1)} \ge \delta_s^{(S_2)} \ge \cdots \ge \delta_s^{(S_K)}$.

Based on these properties, we design an algorithm to solve the
social influence maximization problem, as shown in Algorithm 1. In
this algorithm, we use $\delta_i$ to store the upper bound of the node's
authority, which can be used to indicate whether or not the node is an
important node, and use $\delta_{max}$ to store the current maximum
increment. For example, in step k, suppose the current maximum
increment is $\delta_{max}$. When we run into a node i, if its increment in
the last step satisfies $\delta_i^{(S_{k-1})} < \delta_{max}$, then we do not need to
compute its current increment $\delta_i^{(S_k)}$, since it is impossible for
$\delta_i^{(S_k)}$ to be larger than its previous increment $\delta_i^{(S_{k-1})}$.
Otherwise, we first update $\delta_i^{(S_{k-1})}$ to $\delta_i^{(S_k)}$; if $\delta_i^{(S_k)}$ is
greater than $\delta_{max}$, we assign i to a store variable s to keep it and
assign $\delta_i^{(S_k)}$ to $\delta_{max}$. At last, the value in s is the expected
$s_k$ and we add it into the seed set S. In this process, we keep a
subprocedure to compute $F_S = [f_{S1}, f_{S2}, \ldots, f_{Sn}]$, which is used in
the update of $\delta_i^{(S)}$ in Function UpdateIncrement. In addition, in
every step (i.e., for finding a seed), we re-arrange the order of
nodes to make $\delta_i \ge \delta_{i+1}$, which further reduces the computational
cost. Function UpdateIncrement is used to compute the
influence/independent influence of a node and return the
(i, S)-Authority of node i.

Algorithm 1: Circuit(G, K, Λ, T)
input : G(V, A, C), K, Λ
output: S

Compute 𝒫 = [P_1, P_2, ..., P_n]' = (I + Λ − T)^{-1} e (see Section 2.3.2);
for each vertex i in G do
    δ_i = (1 + λ_i) P_i;
while |S| < K do
    re-arrange the order of vertices to make δ_i ≥ δ_{i+1};
    δ_max = 0;
    for i = 1 to n − |S| do
        if δ_i > δ_max then
            // update δ_i^{(S)}
            δ_i = UpdateIncrement(i, S, Λ, T, G, F_S);
            if δ_i > δ_max then
                δ_max = δ_i;
                s = i;
        else
            break;
    for j = 1 to n − |S| do
        f_{Sj} = 1 − (1 − f_{Sj})(1 − f_{sj}^{(S)}); // update F_S
    S ← S ∪ {s};
    δ_s ← 0;
return S;



Function UpdateIncrement(i, S, Λ, T, G, F_S)
input : i, S, Λ, T, G, F_S
output: δ_i

Compute the i-th column of Γ_{S̄S̄}^{-1} by the methods in Section 3.2
and then get f_{ij}^{(S)} based on Equation (26);
δ_i = 0;
for j = 1 to n − |S| do
    // Based on the discussion in Theorem 2
    δ_i += f_{ij}^{(S)} (1 − f_{Sj});
return δ_i;
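Algorithm 1 and Function UpdateIncrement can be sketched end to end in Python as below. This is a simplified illustration rather than the authors' implementation: it uses dense matrices, a fixed-tolerance Gauss-Seidel solver, and a hypothetical 3-node path network, and all function names are assumptions made for the example.

```python
def gauss_seidel(A, b, tol=1e-12, max_iter=10_000):
    """Solve A @ x = b by Gauss-Seidel sweeps, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        change = 0.0
        for j in range(n):
            rest = sum(A[j][l] * x[l] for l in range(n) if l != j)
            new = (b[j] - rest) / A[j][j]
            change = max(change, abs(new - x[j]))
            x[j] = new
        if change < tol:
            break
    return x


def independent_influence(T, lam, S, k):
    """f_kj^(S) for all j (Equation (26)); entries for seeds in S are 0."""
    keep = [j for j in range(len(T)) if j not in S]
    A = [[(1.0 + lam[j] if j == l else 0.0) - T[l][j] for l in keep]
         for j in keep]
    col = gauss_seidel(A, [1.0 if j == k else 0.0 for j in keep])
    pivot = col[keep.index(k)]
    f = [0.0] * len(T)
    for pos, j in enumerate(keep):
        f[j] = col[pos] / pivot
    return f


def circuit_greedy(T, lam, K):
    """Greedy seed selection with upper-bound pruning (sketch of Algorithm 1)."""
    n = len(T)
    A = [[(1.0 + lam[i] if i == j else 0.0) - T[i][j] for j in range(n)]
         for i in range(n)]
    P = gauss_seidel(A, [1.0] * n)
    delta = [(1.0 + lam[i]) * P[i] for i in range(n)]  # initial upper bounds
    S, f_S = [], [0.0] * n  # f_S[j]: current joint influence of S on node j
    while len(S) < K:
        order = sorted((i for i in range(n) if i not in S),
                       key=lambda i: -delta[i])
        best, best_gain, best_f = None, 0.0, None
        for i in order:
            if delta[i] <= best_gain:
                break  # stale values only shrink, so the rest cannot win
            f_i = independent_influence(T, lam, S, i)
            delta[i] = sum(f_i[j] * (1.0 - f_S[j]) for j in range(n))
            if delta[i] > best_gain:
                best, best_gain, best_f = i, delta[i], f_i
        if best is None:
            break  # no remaining node adds influence
        f_S = [1.0 - (1.0 - f_S[j]) * (1.0 - best_f[j]) for j in range(n)]
        S.append(best)
        delta[best] = 0.0
    return S


# Hypothetical 3-node path 0 - 1 - 2 with t_kj = 1 / degree(j).
T = [[0.0, 0.5, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 0.5, 0.0]]
lam = [0.5, 0.5, 0.5]
first_seed = circuit_greedy(T, lam, 1)  # the middle node is picked first
```

The early `break` is where Property 4 pays off: once a stored increment falls below the best gain found so far, no later node in the sorted order needs to be re-evaluated.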



5. EXPERIMENTAL RESULTS 

Here, we evaluate the performance of the Circuit method on several
real-world social networks. Specifically, we demonstrate: (a) the
scalability and the ability to maximize the influence propagation
compared to benchmark algorithms; (b) the impact of the damping
coefficient λ; and (c) the effectiveness of the upper bounds.

Figure 4: The computational performances.

5.1 The Experimental Setup 

Experimental Data. Four real-world networks have been used in the
experiments. The first one is a Wikipedia voting network in which
nodes represent Wikipedia users and a directed edge from node i to
node j represents that user i voted on user j; the network contains
all the Wikipedia voting data from the inception of Wikipedia until
January 2008 (footnote 2). The second one, denoted as ca-HepPh
(footnote 3), is a collaboration network from the e-print arXiv, and
it covers scientific collaborations between authors whose papers have
been submitted to the High Energy Physics - Phenomenology category.
The third one is an even larger collaboration network, the DBLP
Computer Science Bibliography Database, which is the same as in [4].
The fourth one is another large directed network collected by crawling
the Amazon website. It is based on the Customers Who Bought This Item
Also Bought feature of the Amazon website, where if a product i is
frequently co-purchased with product j, then a directed edge from i to
j is added (footnote 4). We chose these networks since they cover a
variety of networks with sizes ranging from 103K edges to 2M edges and
include two directed networks and two undirected networks. Some basic
statistics about these networks are shown in Table 1.



Table 1: Statistics of four selected real-world networks.

Networks     Wiki-Vote   ca-HepPh     DBLP         Amazon
#Node        7,115       12,008       655K         262K
#Edge/Arc    103,689     237,010      2.0M         1.2M
Type         directed    undirected   undirected   directed



Benchmark Algorithms. The benchmark algorithms are as follows.
First, Circuit is our Algorithm 1, where each node's
influence/independent influence is computed as the average result of
10 iterations; we set the damping coefficient matrix Λ = λI, and λ
ranges in (0, 1) (footnote 5). CELF(20000) is the original greedy
algorithm with the CELF optimization of (B), where R = 20000. PMIA(θ)
is the algorithm proposed in [4]. We used the source code provided by
the authors, and set the parameters to the ones that produce the best
results (footnote 6). In the PageRank algorithm [18], we selected the
top-K nodes with the highest PageRank value. DegreeDiscountIC [5] is
the degree discount heuristic with a propagation probability of
p = 0.01, which is the same as used in [5]. Finally, the Degree method
selects the top-K nodes with the highest degree. Among these
algorithms, Degree, DegreeDiscountIC and PageRank are widely used as
baselines. To the best of our knowledge, CELF and PMIA are two of the
best existing algorithms for solving the influence maximization
problem (concerning the tradeoff between effectiveness and
efficiency).

2 http://snap.stanford.edu/data/wiki-Vote.html

3 http://snap.stanford.edu/data/ca-HepPh.html

4 http://snap.stanford.edu/data/amazon0302.html

5 The value of λ should be larger than 0 and smaller than 1; the
reason is omitted due to limited space.

(a) Wiki-Vote. (b) ca-HepPh. (c) DBLP. (d) Amazon.

Figure 3: The results of influence spread on four benchmark network datasets.

Measurement. The effectiveness of the algorithms for the social
influence maximization problem is measured by the number of nodes
activated by the seed set that each algorithm selects. This is called
the influence spread. To obtain the influence spread of these
algorithms, for each seed set, we run the Monte-Carlo simulation under
the Weighted Cascade (WC) model (footnote 7) 20000 times to find how
many nodes can be influenced (footnote 8), and then use these
influence spreads to compare the effectiveness of these algorithms.
Since the CELF algorithm is very time-consuming, the DBLP and Amazon
networks are too large for it to handle. Thus, we only report the
experimental results for CELF on the two comparatively smaller
networks.
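The Monte-Carlo measurement can be sketched as follows. This is an illustrative sketch only: it assumes the unweighted form of the WC model, in which an active node activates an inactive out-neighbor j with probability 1 / in-degree(j), and the function names, the arc-list input format and the tiny test graphs are hypothetical.

```python
import random


def simulate_wc_once(out_neighbors, in_degree, seeds, rng):
    """Run one cascade and return the number of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for i in frontier:
            for j in out_neighbors[i]:
                # Each newly active node gets a single chance per neighbor.
                if j not in active and rng.random() < 1.0 / in_degree[j]:
                    active.add(j)
                    newly_active.append(j)
        frontier = newly_active
    return len(active)


def estimate_spread(arcs, n, seeds, runs=20000, rng_seed=0):
    """Average cascade size over `runs` simulations (influence spread)."""
    out_neighbors = [[] for _ in range(n)]
    in_degree = [0] * n
    for i, j in arcs:
        out_neighbors[i].append(j)
        in_degree[j] += 1
    rng = random.Random(rng_seed)
    total = sum(simulate_wc_once(out_neighbors, in_degree, seeds, rng)
                for _ in range(runs))
    return total / runs


# A directed chain 0 -> 1 -> 2: every node has in-degree at most 1,
# so each activation succeeds and the spread from node 0 is exactly 3.
chain_spread = estimate_spread([(0, 1), (1, 2)], 3, [0], runs=200)
```

Fixing `rng_seed` makes the estimate reproducible, which helps when comparing seed sets chosen by different algorithms.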

Experimental Platform. The experiments were performed on a 
server with 2.0GHz Quad-Core Intel Xeon E5410 and 8G memory. 

5.2 A Performance Comparison

In the following, we present a performance comparison of both
effectiveness and efficiency between Circuit and the benchmarks. For
the purpose of comparison, we record the best performance of each
algorithm by tuning its parameters. We run tests on the four networks
under the WC model to obtain the results of influence spread. The seed
set size K ranges from 1 to 50. Figure 3 shows the final results of
influence spread. For readability, we plot a marker at every 5 points.
Figure 4 shows the computational performance comparison for selecting
50 seeds.

In Figure 3(a), we can observe that CELF, Circuit and PMIA are the
three best algorithms for Wiki-Vote. PageRank performs well at the
beginning but deteriorates when the seed set is large (e.g., larger
than 35). DegreeDiscountIC and Degree perform worse than the others,
with DegreeDiscountIC slightly better than Degree. Similar results
can be observed on the remaining networks.

6 Based on the source code from its author, the parameter is
selected from {1/10, 1/20, 1/40, 1/80, 1/160, 1/320, 1/1280}.
7 Other models are also introduced in [11], but due to space
limitations we focus on the WC model, which is an important case of
the Independent Cascade (IC) model. Moreover, we can prove that many
IC models can be reformulated as the WC model. The proof is included
in our full technical report.
8 In detail, under the WC model, a node in the seed set propagates
its influence as follows. We view each node in the seed set S as
activated at time t = 0. If node i is activated at time t, then it
activates each not-yet-activated neighbor node j at time t + 1 (and
only at time t + 1) with probability $1/d_j$, where $d_j$ is the
in-degree of node j, as in the standard WC setting of [11].
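As an illustration of the WC process described in footnote 8, the
following minimal Python sketch estimates the influence spread of a
seed set by Monte Carlo simulation. It assumes the standard WC
activation probability 1/d_j, where d_j is the in-degree of node j.
The graph representation (a dictionary mapping each node to its list
of in-neighbors) and the names simulate_wc and estimate_spread are
illustrative choices, not part of the original experiments.

```python
import random


def simulate_wc(in_neighbors, seeds, rng):
    """Run one weighted-cascade simulation and return the activated set.

    in_neighbors maps each node j to the list of nodes i with an edge i -> j.
    Assumption: node j is activated by a newly active in-neighbor with
    probability 1 / in-degree(j), the standard WC setting.
    """
    # Derive out-neighbor lists from the in-neighbor lists.
    out_neighbors = {}
    for j, preds in in_neighbors.items():
        for i in preds:
            out_neighbors.setdefault(i, []).append(j)

    active = set(seeds)
    frontier = list(active)  # nodes activated in the previous time step
    while frontier:
        next_frontier = []
        for i in frontier:
            for j in out_neighbors.get(i, []):
                if j in active:
                    continue
                # Node i gets exactly one chance to activate j.
                if rng.random() < 1.0 / len(in_neighbors[j]):
                    active.add(j)
                    next_frontier.append(j)
        frontier = next_frontier
    return active


def estimate_spread(in_neighbors, seeds, runs=1000, seed=0):
    """Average number of activated nodes over independent simulations."""
    rng = random.Random(seed)
    total = sum(len(simulate_wc(in_neighbors, seeds, rng)) for _ in range(runs))
    return total / runs
```

On a chain a -> b -> c every node has in-degree 1, so the cascade
started at a always reaches all three nodes; on larger graphs the
estimate becomes more accurate as the number of runs grows.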

In summary, in most cases Circuit and CELF perform much better than
the other methods. We believe there are two reasons. First, the
algorithms differ greatly in how they handle independent influence.
Degree and PageRank do not consider whether a node's influence is
independent of other nodes. DegreeDiscountIC and PMIA do try to
remove the influence that is diluted by other nodes, but their
methods are too simple to recover the real independent influence. In
contrast, Circuit computes each node's independent influence with a
reasonable and tractable method, and CELF may also obtain each node's
real independent influence if its parameter R is made arbitrarily
large. Thus, the more attention an algorithm pays to independent
influence, the better its results tend to be. Second, the networks
differ in structure (e.g., clustering coefficient), and these
differences may also lead to different performance for each
algorithm. For example, we find that Degree performs well on networks
whose clustering coefficient is comparatively small: in such
networks, the most influential nodes often belong to different
clusters, so their influence is almost independent of each other.

Figure 4 shows the running time for selecting 50 seeds on the five
networks. The algorithms rank in running time as Degree >
DegreeDiscountIC > PageRank > PMIA > Circuit > CELF, where ">" means
"is more efficient than". CELF is not scalable to large networks
with millions of edges, whereas Circuit is scalable and takes less
than one hour on the DBLP network with 2.0M edges. Comparing Figure 3
with Figure 4, we can observe some correlation between the
effectiveness and the efficiency of each algorithm. The differences
in running time should therefore also lie mainly in how the
algorithms handle independent influence. Generally speaking, the more
attention an algorithm pays to independent influence, the more
complex it is and the longer it takes to run.

Table 2: Detailed Comparison Between Circuit and CELF.

Network      Effectiveness   Speedup   Search Ratio
Wiki-Vote    0.996           1578      0.037
ca-HepPh     1.006           4303      0.035


(c) Wiki-Vote. (d) ca-HepPh.

Figure 5: The variation of the effectiveness and the running time of
Circuit as λ changes on two network datasets: Wiki-Vote and
ca-HepPh. The top row of figures shows the variation of
effectiveness and the bottom row shows the variation of the running
time.

A Comparison with CELF. Here, we provide a more detailed comparison
between Circuit and CELF, the two best algorithms in terms of
effectiveness. Specifically, we compare their average performance on
the three networks, and the results are shown in Table 2, where the
performance of CELF is taken as the baseline. In this table,
"Effectiveness" is the ratio of Circuit's influence spread to
CELF's, "Speedup" is the ratio of CELF's running time to Circuit's,
and "Search Ratio" is the ratio of the number of nodes investigated
by Circuit to the number investigated by CELF. Consistent with
Figure 3, Circuit's effectiveness is very close to CELF's on the
three moderately sized networks: 0.4% worse than CELF on Wiki-Vote
but 0.6% better on ca-HepPh. In terms of efficiency, Circuit
outperforms CELF significantly: on these three networks it is 1578,
835, and 4303 times faster than CELF, respectively. With respect to
the number of nodes searched, CELF needs to search all nodes in the
network, while Circuit needs to search fewer than 5% of the nodes
due to the effect of Property [T].
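To make the direction of each ratio in Table 2 explicit, the short
Python sketch below computes the three quantities from raw
measurements. The RunResult container, the function name, and the
numbers in the usage note are hypothetical and serve only as an
illustration; they are not measurements from the experiments.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    spread: float         # influence spread of the selected seed set
    seconds: float        # running time of the algorithm
    nodes_searched: int   # number of nodes investigated during selection


def compare_to_baseline(method, baseline):
    """Ratios of a method against a baseline (CELF in the text)."""
    return {
        # Values above 1 mean the method spreads further than the baseline.
        "effectiveness": method.spread / baseline.spread,
        # Values above 1 mean the method is faster than the baseline.
        "speedup": baseline.seconds / method.seconds,
        # Values below 1 mean the method inspects fewer nodes.
        "search_ratio": method.nodes_searched / baseline.nodes_searched,
    }
```

For example, a method with spread 90, running time 2 s and 50 nodes
searched, compared against a baseline with spread 100, running time
200 s and 1000 nodes searched, yields an effectiveness of 0.9, a
speedup of 100 and a search ratio of 0.05.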

Summary. Generally, for solving the social influence maximization
problem, Circuit and CELF perform consistently well on each network,
but CELF is not scalable to large networks. Although PMIA and
PageRank are two good scalable heuristic algorithms, their
performance is not stable across networks (e.g., DBLP and Amazon in
this paper), and similar results can be observed for both Degree and
DegreeDiscountIC.

5.3 The Impact of λ

We investigate the effect of the tuning parameter λ on the running
time of Circuit and on its influence spread. Specifically, we vary λ
from 0.05 to 1 in steps of 0.05 and record the corresponding
influence spread and running time. For a clearer view of the
influence spread results, we report effectiveness as the ratio of
Circuit's influence spread to CELF's.

Figures 5(a) and 5(b) show the effectiveness of Circuit with
different λ on the Wiki-Vote and ca-HepPh networks, respectively. In
these figures, the x axis is the λ value; the red long-dash line is
y = 1, which indicates the result of CELF; and the last red bar on
the right side is the effectiveness of the best algorithm among PMIA,
PageRank, DegreeDiscountIC, and Degree. From these figures, we obtain
the following observations:

• The performance of Circuit is very stable. Whatever the value of
λ, the variation in effectiveness is less than 0.04, and for most λ
values the effectiveness of Circuit is larger than 0.97 (relative to
CELF) and better than the best one among PMIA, PageRank,
DegreeDiscountIC and Degree.

• As λ increases toward 1, the effectiveness of Circuit first
ascends and then descends from a certain point. This observation can
be explained by our assumption in Section 2.2 that there exists a
certain damping coefficient in the real-world information propagation
process. The farther the manually set value is from the real damping
value, the worse our model describes the real information
propagation, and vice versa. Experimentally, the real λ lies in the
range [0.1, 0.4].

Figures 5(c) and 5(d) show the running time of Circuit with
different λ on Wiki-Vote and ca-HepPh, respectively (for a better
view, we show only the results for λ = 0.1, 0.2, 0.25, 0.3, 0.4, 0.6,
and 0.9). In these figures, we can observe that the running time of
Circuit decreases as λ increases. When λ = 0.6, the running times of
Circuit are 22.2s and 56.1s, respectively, which are comparable with
PMIA's (10.8s and 56.3s, respectively), while the influence spread
of Circuit with λ = 0.6 is better than PMIA's in both cases. From
these observations, if we want better effectiveness we should set λ
to a value in [0.1, 0.4], and if we want the result quickly we
should set λ to a comparatively large value.
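The parameter study above can be organized as a simple sweep. The
Python sketch below builds the grid from 0.05 to 1 in steps of 0.05
and records, for each value, the effectiveness relative to a baseline
and the wall-clock time. The callable run_circuit is a hypothetical
stand-in for an implementation of Circuit that returns the influence
spread for a given parameter value; it is an assumption of this
sketch.

```python
import time


def lambda_grid(start=0.05, stop=1.0, step=0.05):
    """Return the grid start, start + step, ..., stop (inclusive)."""
    count = round((stop - start) / step) + 1
    # Rounding removes floating-point drift such as 0.15000000000000002.
    return [round(start + i * step, 10) for i in range(count)]


def sweep(run_circuit, baseline_spread, grid):
    """Evaluate a solver on every grid value.

    run_circuit: hypothetical callable, parameter value -> influence spread.
    baseline_spread: influence spread of the baseline (CELF in the text).
    Returns a list of (value, effectiveness, seconds) tuples.
    """
    rows = []
    for lam in grid:
        started = time.perf_counter()
        spread = run_circuit(lam)
        seconds = time.perf_counter() - started
        rows.append((lam, spread / baseline_spread, seconds))
    return rows
```

Picking the row with the largest effectiveness, or the largest
parameter value whose effectiveness stays above a chosen threshold,
then reflects the trade-off between quality and running time
discussed above.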

5.4 The Effectiveness of Upper Bound 

To demonstrate the effectiveness of the upper bound proposed in
Property Q], we run a test on two networks: Wiki-Vote and ca-HepPh.
We first select the top-100 nodes with the highest indegree from each
network, then compute their importance ($T_i = \sum_j f_{ij}$) and
their corresponding upper bound $(1 + \lambda) P_i$ (with parameter
λ = 0.25). Figure 6 shows the indegree, importance, and upper bound
of the top-100 nodes, where the green line is the indegree value, the
blue long-dash line is the upper bound value, and the red line is the
importance value. In Figure 6, we can observe that, for each network,
the blue line is always very close to the red line, which
demonstrates that $(1 + \lambda) P_i$ is a very tight upper bound of
the importance $T_i$. This is a very important reason why Property Q]
can help us reduce the search space to less than 5% (as illustrated
in Table 2).
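The way a tight upper bound shrinks the search space can be
illustrated with a generic bound-and-prune loop. The Python sketch
below is an assumption about how such a bound might be exploited, not
the authors' procedure: candidates are visited in decreasing order of
their upper bound, the expensive exact score is computed only when
needed, and the scan stops once no remaining bound can beat the
current k-th best exact score. The names top_k_with_upper_bound and
exact_score are hypothetical.

```python
import heapq


def top_k_with_upper_bound(upper_bound, exact_score, k):
    """Find the k nodes with the largest exact score using few evaluations.

    upper_bound: dict node -> value that is >= the node's exact score.
    exact_score: callable node -> exact score (assumed to be expensive).
    Returns (nodes ordered by decreasing score, number of evaluations).
    Nodes must be mutually comparable, because ties in score are broken
    by comparing the nodes themselves.
    """
    order = sorted(upper_bound, key=upper_bound.get, reverse=True)
    best = []  # min-heap of (score, node) holding the current top k
    evaluated = 0
    for node in order:
        # All remaining bounds are at most upper_bound[node], so none of
        # the remaining nodes can strictly beat the current k-th best.
        if len(best) == k and upper_bound[node] <= best[0][0]:
            break
        score = exact_score(node)
        evaluated += 1
        if len(best) < k:
            heapq.heappush(best, (score, node))
        elif score > best[0][0]:
            heapq.heapreplace(best, (score, node))
    ranked = sorted(best, reverse=True)
    return [node for _, node in ranked], evaluated
```

The tighter the bound, the earlier the loop terminates, which is
consistent with the small search ratios reported in Table 2.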

6. RELATED WORK 

Related work can be grouped into two categories. In the first 
category, we describe some existing social influence models. The 
second category includes the existing works for the social influence 
maximization problem. 

Social Influence Models. Many works on social influence have been
published in the literature. For instance, Anagnostopoulos et al. [3]
proved the existence of social influence by statistical tests, and
Goyal et al. [9] studied how to learn the true probabilities of
social influence between individuals. In addition, there are several
models of how influence propagates through the network. For example,
Granovetter [10] proposed the Linear Threshold (LT) model, while
Goldenberg et al. [8] proposed the Independent Cascade (IC) model. To
the best of our knowledge, these are the two most widely used models.
Since these two models are not tractable, Kimura et al. [12] proposed
a comparatively tractable model, SPM, and Aggarwal et al. [2]
proposed a stochastic model to address this issue. In addition, Tang
et al. [20] proposed Topical Affinity Propagation, a graphical
probabilistic model, for topic-based social influence analysis.
Recently, Easley and Kleinberg [7] and Aggarwal [1] summarized and
generalized many existing studies on social influence and other
research aspects of social networks. More importantly, they
demonstrate that, with careful study, the information extracted from
social influence can be leveraged to deal with real-world problems
(e.g., problems from markets or social security) effectively and
efficiently.

(a) Wiki-Vote. (b) ca-HepPh.

Figure 6: The upper bound of Authority on four networks: Wiki-Vote,
NetHEPT, ca-HepPh, and Amazon.

Social Influence Maximization. Domingos and Richardson proposed to
exploit social influence for marketing applications, which is called
viral marketing [6, 19], or social influence maximization. The goal
is to find a set of seeds that will influence the maximal number of
individuals in the network.

Kempe et al. formulated the influence maximization problem as a
discrete optimization problem and considered three cascade models
(i.e., the IC model, the Weighted Cascade (WC) model [11], and the LT
model). They also proved that the optimization problem is NP-hard and
presented a greedy approximation (GA) algorithm applicable to all
three models, which guarantees that the influence spread is within
(1 - 1/e) of the optimal result. To address the efficiency issue,
Leskovec et al. [15] presented an optimized greedy algorithm,
referred to as the "Cost-Effective Lazy Forward" (CELF) scheme. The
CELF optimization uses the submodularity of the influence
maximization objective to reduce the number of evaluations of the
influence spread of nodes. To address the scalability issue, Chen et
al. proposed several heuristic methods, including DegreeDiscountIC
[5] and PMIA [4], the latter of which uses the local arborescence
structure of each individual to approximate the social influence
propagation. Alternatively, Wang et al. [21] presented a
community-based greedy algorithm to find the top-K influential
nodes: they first detect the communities in the social network and
then find influential nodes from the selected candidate communities.
Among the aforementioned methods, the best algorithms for finding the
most influential seed set are those that use Monte Carlo simulations
to compute the influence spread of each given seed set; GA and CELF
are both of this type.
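The lazy-forward idea behind CELF can be sketched in a few lines of
Python. The version below is a simplified illustration under stated
assumptions, not the implementation of [15]: spread is any monotone
submodular set function with spread([]) equal to 0 (in practice a
Monte Carlo estimate of the influence spread), nodes must be mutually
comparable so that ties in the heap can be broken, and the function
name celf is chosen here for convenience.

```python
import heapq


def celf(nodes, spread, k):
    """Lazy greedy selection of k seeds.

    spread: callable list_of_nodes -> value, assumed monotone submodular
    with spread([]) == 0.
    Returns (seeds, total spread, number of spread evaluations).
    """
    # Each heap entry is (-marginal gain, node, seed count when computed).
    heap = [(-spread([v]), v, 0) for v in nodes]
    heapq.heapify(heap)
    evaluations = len(heap)
    seeds, current = [], 0.0
    while heap and len(seeds) < k:
        neg_gain, v, stamp = heapq.heappop(heap)
        if stamp == len(seeds):
            # The gain is up to date. By submodularity every other entry
            # is an upper bound on its true gain, so v is the best choice.
            seeds.append(v)
            current += -neg_gain
        else:
            # Stale gain: recompute it against the current seed set.
            gain = spread(seeds + [v]) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, current, evaluations
```

On a small coverage example with four candidate nodes and k = 2, this
procedure needs 5 evaluations, whereas plain greedy would need
4 + 3 = 7, because only the top stale candidate has to be
re-evaluated in the second round.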

7. CONCLUSION 

In this paper, we developed a social influence model based on
circuit theory for describing information propagation in social
networks. This model is tractable and flexible for understanding
patterns of information propagation. Under this model, several upper
bound properties were identified. These properties help us quickly
locate the nodes to be considered during the information propagation
process. This can drastically reduce the search space, and thus
vastly improve the efficiency of measuring the influence strength
between any pair of nodes. In addition, the circuit theory based
model provides a new way to compute the independent influence of
nodes and leads to a natural solution to the social influence
maximization problem. Finally, experimental results showed the
advantages of the circuit theory based model over existing models in
terms of both efficiency and effectiveness for measuring information
propagation in social networks.

8. REFERENCES 

[1] C. Aggarwal. Social network data analytics. Springer-Verlag
New York Inc, 2011.
[2] C. Aggarwal, A. Khan, and X. Yan. On flow authority discovery in
social networks. 2011.
[3] A. Anagnostopoulos, R. Kumar, and M. Mahdian. Influence and
correlation in social networks. In SIGKDD 2008, pages 7-15. ACM,
2008.
[4] W. Chen, C. Wang, and Y. Wang. Scalable influence maximization
for prevalent viral marketing in large-scale social networks. In
SIGKDD 2010, pages 1029-1038. ACM, 2010.
[5] W. Chen, Y. Wang, and S. Yang. Efficient influence maximization
in social networks. In SIGKDD 2009, pages 199-208. ACM, 2009.
[6] P. Domingos and M. Richardson. Mining the network value of
customers. In SIGKDD 2001, pages 57-66. ACM, 2001.
[7] D. Easley and J. Kleinberg. Networks, crowds, and markets:
Reasoning about a highly connected world. Cambridge Univ Pr, 2010.
[8] J. Goldenberg, B. Libai, and E. Muller. Talk of the network: A
complex systems look at the underlying process of word-of-mouth.
Marketing Letters, 12(3):211-223. Springer, 2001.
[9] A. Goyal, F. Bonchi, and L. Lakshmanan. Learning influence
probabilities in social networks. In WSDM 2010, pages 241-250. ACM,
2010.
[10] M. Granovetter. Threshold models of collective behavior.
American Journal of Sociology, pages 1420-1443. JSTOR, 1978.
[11] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of
influence through a social network. In SIGKDD 2003, pages 137-146.
ACM, 2003.
[12] M. Kimura and K. Saito. Tractable models for information
diffusion in social networks. In PKDD 2006, pages 259-271. Springer,
2006.
[13] G. Kirchhoff. Vorlesungen ueber mathematische physik, mechanik.
Leipzig: Teubner, 1877.
[14] A. Langville and C. Meyer. Deeper inside pagerank. Internet
Mathematics, 1(3):335-380, 2004.
[15] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos,
J. VanBriesen, and N. Glance. Cost-effective outbreak detection in
networks. In SIGKDD 2007, pages 420-429. ACM, 2007.
[16] C. Long and R.C.W. Wong. Minimizing Seed Set for Viral
Marketing. In ICDM 2011, pages 427-436. IEEE, 2011.
[17] R. Narayanam and Y. Narahari. A shapley value-based approach to
discover influential nodes in social networks. IEEE Transactions on
Automation Science and Engineering, 99:1-18, IEEE, 2010.
[18] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank
citation ranking: Bringing order to the web. 1999.
[19] M. Richardson and P. Domingos. Mining knowledge-sharing sites
for viral marketing. In SIGKDD 2002, pages 61-70. ACM, 2002.
[20] J. Tang, J. Sun, C. Wang, and Z. Yang. Social influence analysis
in large-scale networks. In SIGKDD 2009, pages 807-816. ACM, 2009.
[21] Y. Wang, G. Cong, G. Song, and K. Xie. Community-based greedy
algorithm for mining top-k influential nodes in mobile social
networks. In SIGKDD 2010, pages 1039-1048. ACM, 2010.







